The verbal valency in the Prague Dependency Treebank from the annotator's point of view
Abstract
Valency, one of the core ingredients of the Prague Dependency Treebank (PDT; see Hajič, this volume), indicates the capability of lexical units to combine with other complementations. The PDT has adopted the concept of valency from the valency theory of the Functional Generative Description (FGD) (see Sgall, 1967; Sgall et al., 1986). The valency theory of the FGD was first developed for verbs and later extended to other parts of speech. We describe how we dealt with the valency of verbs during the annotation of the PDT and how the verbal part of the valency lexicon (PDT-VALLEX) was built. We focus on some specific problems related to verbal valency (as well as to some other verbal complementations) from the point of view of the PDT.
1. The concept of valency in PDT
One of the prerequisites for correct syntactic annotation at the tectogrammatical level (TR) of the Prague Dependency Treebank (see Hajič, this volume) is knowledge of valency frames. The valency theory (see Panevová, 1974-75, 1980, 1994, 1999) applied in the annotation of the PDT corresponds to the concepts of the Functional Generative Description (FGD) (see Sgall, 1967; Sgall et al., 1986). Within this approach, both syntactic and semantic criteria are used to identify verbal complementations.
The verb is considered to be the core of the sentence (or of the clause, as the case may be). Its complementations (dependents) are classified either as inner participants or as free modifications. Both types of verbal complementations can be either obligatory (semantically always present with a given verb) or optional (not necessarily present). Only inner participants (obligatory or optional) and obligatory free modifications belong to the verbal valency frame; optional free modifications are not listed in it. The relation between a dependent and its governor at the TR is labeled by a functor.
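The rule for what enters a verbal valency frame can be sketched as a simple filter over a verb's complementations. This is a minimal illustration, not PDT-VALLEX code. It assumes that a complementation is represented only by its functor and an obligatoriness flag, and that the five FGD inner participants (ACT, PAT, ADDR, EFF, ORIG) identify inner participants while every other functor counts as a free modification. The `Complementation` class, the `valency_frame` function and the complementations listed for "arrive" are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the five inner participants of FGD; any other functor
# is treated here as a free modification.
INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "EFF", "ORIG"}


@dataclass(frozen=True)
class Complementation:
    functor: str      # e.g. "ACT", "DIR3", "TWHEN"
    obligatory: bool  # semantically always present with the verb?

    @property
    def is_inner_participant(self) -> bool:
        return self.functor in INNER_PARTICIPANTS


def valency_frame(complementations):
    """Keep inner participants (obligatory or optional) and obligatory
    free modifications; drop optional free modifications."""
    return [
        c for c in complementations
        if c.is_inner_participant or c.obligatory
    ]


# Illustrative, assumed complementations of a verb such as "arrive".
arrive = [
    Complementation("ACT", obligatory=True),     # inner participant
    Complementation("DIR3", obligatory=True),    # obligatory free modification
    Complementation("TWHEN", obligatory=False),  # optional free modification
]

print([c.functor for c in valency_frame(arrive)])  # ['ACT', 'DIR3']
```

Marking TWHEN as obligatory would add it to the frame, while an optional inner participant such as an optional ADDR would stay in the frame. This mirrors the asymmetry between the two kinds of complementation described above.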
The functor must be determined and recorded for every complementation during data annotation. Annotators choose this value from a set of functors listed in the manual for tectogrammatical
1 Neither so-called quasi-valency complementations nor typical complementations are stored in the valency frames of the PDT-VALLEX lexicon (these types of complementations are described by Lopatková et al. (2003) and Panevová (2003)).
2 We discuss here the verbal valency frame in a narrow, strict sense, i.e. the verbal valency frame captured in the lexicon. The verbal valency frame in a broader sense consists of all complementations that can expand the given verb. The types of all complementations are captured in the structure of the annotated tree as attribute values of the dependent nodes.
3 Valency is also considered for many nouns and adjectives; see Řezníčková, V. (2003) and Hajič et al. (2003).
4 If annotators hesitate over the correct value of the functor, they can mark this uncertainty by selecting several functors.
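Recording a functor for every complementation, including the option (note 4) of marking uncertainty by selecting several functors, can be sketched as a small validation helper. This is a hypothetical sketch, not the PDT annotation tool: nodes are plain dictionaries, `ALLOWED_FUNCTORS` is an assumed small subset of the functor list (the full inventory is defined in the PDT annotation manual), and `record_functor` is an illustrative name.

```python
# Assumed subset of the functor list, for illustration only.
ALLOWED_FUNCTORS = {
    "ACT", "PAT", "ADDR", "EFF", "ORIG",     # inner participants
    "LOC", "DIR1", "DIR3", "TWHEN", "MANN",  # some free modifications
}


def record_functor(node: dict, *functors: str) -> dict:
    """Store one functor on a node, or several if the annotator hesitates."""
    if not functors:
        raise ValueError("every complementation must receive a functor")
    unknown = [f for f in functors if f not in ALLOWED_FUNCTORS]
    if unknown:
        raise ValueError(f"not in the functor list: {unknown}")
    chosen = tuple(dict.fromkeys(functors))  # drop duplicates, keep order
    node["functor"] = chosen
    node["uncertain"] = len(chosen) > 1      # multiple selection = uncertainty
    return node


certain = record_functor({"id": "n1"}, "DIR1")
unsure = record_functor({"id": "n2"}, "LOC", "DIR3")
print(certain["functor"], certain["uncertain"])
print(unsure["functor"], unsure["uncertain"])
```

Keeping the chosen functors as a tuple lets a single annotated node carry either one value or an explicitly marked set of alternatives, and rejecting unknown labels enforces that the value comes from the fixed functor list.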